Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, and inappropriate augmentations can easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbations, we construct two views of the graph with specially designed Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes apart those from different clusters, by maximizing the cross-view cosine similarity of positive pairs and minimizing that of negative pairs. Extensive experiments on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
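To make the sample-pair construction concrete, here is a minimal sketch of a cluster-guided cross-view objective. It assumes two plain embedding lists `z1` and `z2`, one per view, and a precomputed set of high-confidence node indices. The function name `cluster_guided_loss` and the simple "(1 - cosine) for positives plus cosine for center negatives" form are illustrative assumptions, not CCGC's exact loss.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cluster_guided_loss(z1, z2, labels, confident):
    """Hypothetical simplification of a cluster-guided contrastive objective.

    z1, z2: node embeddings from the two unshared encoders (one list per node).
    labels: cluster id of every node; confident: indices of high-confidence nodes.
    """
    clusters = {}
    for i in confident:
        clusters.setdefault(labels[i], []).append(i)
    # Positives: cross-view pairs drawn from the same high-confidence cluster.
    pos = [cosine(z1[i], z2[j]) for members in clusters.values()
           for i in members for j in members]
    pos_loss = sum(1.0 - s for s in pos) / len(pos)
    # Negatives: cross-view centers of *different* high-confidence clusters.
    c1 = {c: mean_vector([z1[i] for i in m]) for c, m in clusters.items()}
    c2 = {c: mean_vector([z2[i] for i in m]) for c, m in clusters.items()}
    neg = [cosine(c1[a], c2[b]) for a in clusters for b in clusters if a != b]
    neg_loss = sum(neg) / len(neg) if neg else 0.0
    # Minimizing this pulls same-cluster samples together and pushes centers apart.
    return pos_loss + neg_loss
```

Nodes outside `confident` contribute nothing, which mirrors the idea of trusting only high-confidence clustering results when building pairs.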
Knowledge graph reasoning (KGR), which aims to deduce new facts from existing ones based on logic rules mined from knowledge graphs (KGs), has become a fast-growing research direction. It has been shown to significantly benefit the use of KGs in many AI applications, such as question answering and recommendation systems. According to the graph type, existing KGR models can be roughly divided into three categories, i.e., static models, temporal models, and multi-modal models. Early works in this domain mainly focus on static KGR and tend to directly apply general knowledge graph embedding models to the reasoning task. However, these models are not suitable for more complex but practical tasks, such as inductive static KGR, temporal KGR, and multi-modal KGR. To this end, multiple works have been developed recently, but no survey paper or open-source repository comprehensively summarizes and discusses the models in this important direction. To fill the gap, we conduct a survey of knowledge graph reasoning, tracing its development from static to temporal and then to multi-modal KGs. Concretely, we introduce and discuss the preliminaries, summaries of KGR models, and typical datasets in turn. Moreover, we discuss the challenges and potential opportunities. The corresponding open-source repository is shared on GitHub: https://github.com/LIANGKE23/Awesome-Knowledge-Graph-Reasoning.
Graph contrastive learning is an important method for deep graph clustering. Existing methods first generate graph views with stochastic augmentations and then train the network with a cross-view consistency principle. Although good performance has been achieved, we observe that existing augmentation methods are usually random and rely on pre-defined augmentations, which is insufficient and lacks coordination with the final clustering task. To solve this problem, we propose a novel Graph Contrastive Clustering method with Learnable graph Data Augmentation (GCC-LDA), which is optimized entirely by neural networks. An adversarial learning mechanism is designed to keep cross-view consistency in the latent space while ensuring the diversity of the augmented views. In our framework, a structure augmentor and an attribute augmentor are constructed to learn augmentations at both the structure level and the attribute level. To improve the reliability of the learned affinity matrix, clustering is introduced into the learning procedure, and the learned affinity matrix is refined with both the high-confidence pseudo-label matrix and the cross-view sample similarity matrix. To provide persistent optimization for the learned views during training, we design a two-stage training strategy that obtains more reliable clustering information. Extensive experimental results on six benchmark datasets demonstrate the effectiveness of GCC-LDA.
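The affinity-refinement step can be sketched as a simple merge of two matrices. The rule used here is an assumption made for illustration: a pair where both nodes are high-confidence takes its pseudo-label value (1 for the same cluster, 0 otherwise), and every other pair keeps the cross-view cosine similarity. The helper names `cross_view_similarity` and `refine_affinity` are hypothetical, and GCC-LDA's actual combination may weight or normalize the terms differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cross_view_similarity(z1, z2):
    """S[i][j] = cosine similarity between node i in view 1 and node j in view 2."""
    return [[cosine(u, v) for v in z2] for u in z1]


def refine_affinity(labels, confident, cross_view_sim):
    """Hypothetical refinement of an n x n affinity matrix.

    Pairs where both nodes are high-confidence take the pseudo-label value
    (1 if same cluster, 0 otherwise); every other pair falls back to the
    cross-view sample similarity.
    """
    n = len(labels)
    conf = set(confident)
    refined = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i in conf and j in conf:
                refined[i][j] = 1.0 if labels[i] == labels[j] else 0.0
            else:
                refined[i][j] = cross_view_sim[i][j]
    return refined
```

With this design, the trusted pseudo-labels override the noisier similarity only where both endpoints are reliable, while the rest of the matrix still carries soft cross-view information.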
Graph anomaly detection (GAD) is a vital task in graph-based machine learning and has been widely applied in many real-world applications. The primary goal of GAD is to capture anomalous nodes in graph datasets that evidently deviate from the majority of nodes. Recent methods have paid attention to contrastive strategies at various scales for GAD, i.e., node-subgraph and node-node contrasts. However, they neglect subgraph-subgraph comparison, even though normal and abnormal subgraph pairs behave differently in terms of embeddings and structures, which results in sub-optimal task performance. In this paper, we put this idea into practice in the proposed multi-view multi-scale contrastive learning framework, which is the first to introduce subgraph-subgraph contrast. To be specific, we regard the original input graph as the first view and generate the second view by graph augmentation with edge modifications. Guided by maximizing the similarity of subgraph pairs, the proposed subgraph-subgraph contrast yields more robust subgraph embeddings despite structural variation. Moreover, the introduced subgraph-subgraph contrast cooperates well with the widely adopted node-subgraph and node-node contrastive counterparts, so that they mutually improve GAD performance. Besides, we conduct extensive experiments to investigate the impact of different graph augmentation approaches on detection performance. The comprehensive experimental results demonstrate the superiority of our method over state-of-the-art approaches and the effectiveness of the multi-view subgraph pair contrastive strategy for the GAD task.
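A minimal sketch of the subgraph-subgraph contrast, under several stated assumptions:

- The second view is produced by random edge dropping (one possible edge modification).
- Each subgraph is a node's 1-hop ego network, summarized by mean pooling.
- The contrast is a standard InfoNCE loss in which the same center node's subgraphs in the two views form the positive pair.

The names `drop_edges`, `subgraph_embedding` and `subgraph_contrast_loss`, and the temperature `tau`, are hypothetical. They do not describe the paper's encoder or sampling scheme.

```python
import math
import random


def drop_edges(adj, p, seed=0):
    """Edge-modification augmentation: keep each undirected edge with prob. 1 - p."""
    rng = random.Random(seed)
    new_adj = {u: set() for u in adj}
    for u in sorted(adj):
        for v in sorted(adj[u]):
            if u < v and rng.random() >= p:
                new_adj[u].add(v)
                new_adj[v].add(u)
    return new_adj


def subgraph_embedding(adj, feats, center):
    """Mean-pool the features of a node and its 1-hop neighbors (a simple readout)."""
    nodes = [center] + sorted(adj[center])
    return [sum(feats[n][d] for n in nodes) / len(nodes) for d in range(len(feats[center]))]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def subgraph_contrast_loss(view1, view2, tau=0.5):
    """InfoNCE over subgraph embeddings: the same center's subgraphs in the two
    views are the positive pair; other nodes' subgraphs act as negatives."""
    total = 0.0
    for i in range(len(view1)):
        logits = [cosine(view1[i], view2[j]) / tau for j in range(len(view2))]
        top = max(logits)
        log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum - logits[i]
    return total / len(view1)
```

Usage: build `view1` from the original adjacency and `view2` from `drop_edges(adj, p)`, calling `subgraph_embedding` for every node, then minimize `subgraph_contrast_loss(view1, view2)`.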
Recently, graph anomaly detection has attracted increasing attention in the data mining and machine learning communities. Apart from attribute anomalies, graph anomaly detection also captures suspicious topological anomalies, i.e., nodes whose topology differs from that of the majority. Although many graph-based detection approaches have been proposed, most of them focus on node-level comparison while paying insufficient attention to the surrounding topology structures. Nodes with more dissimilar neighborhood substructures are more likely to be abnormal. To enhance the ability to detect local substructures, we propose a novel Graph Anomaly Detection framework via Multi-scale Substructure Learning (GADMSL). Unlike previous algorithms, we capture anomalous substructures whose inner similarities are relatively low in densely connected regions. Specifically, we adopt a region proposal module to find high-density substructures in the network as suspicious regions. The similarities among the embeddings of their inner nodes indicate the anomaly degree of the detected substructures: generally, lower embedding similarity means a higher probability that the substructure contains topology anomalies. To distill better embeddings of node attributes, we further introduce a graph contrastive learning scheme, which detects attribute anomalies at the same time. In this way, GADMSL can detect both topology and attribute anomalies. Finally, extensive experiments on benchmark datasets show that GADMSL greatly improves detection performance (up to 7.30% AUC and 17.46% AUPRC gains) compared with state-of-the-art anomaly detection algorithms for attributed networks.
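The scoring rule described above (low inner similarity in a dense region signals a topology anomaly) can be sketched as follows. The region proposal here is a deliberately simple stand-in chosen for illustration: it keeps ego networks whose induced edge density reaches an assumed threshold `min_density`. It is not GADMSL's region proposal module, and `ego_density`, `propose_regions` and `substructure_anomaly_score` are hypothetical names.

```python
import math
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ego_density(adj, center):
    """Edge density of the subgraph induced by a node and its neighbors."""
    nodes = {center} | adj[center]
    k = len(nodes)
    if k < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(sorted(nodes), 2) if v in adj[u])
    return edges / (k * (k - 1) / 2)


def propose_regions(adj, min_density=0.8):
    """Hypothetical region proposal: deduplicated ego networks that are dense enough."""
    regions = {tuple(sorted({c} | adj[c])) for c in adj
               if len(adj[c]) >= 2 and ego_density(adj, c) >= min_density}
    return sorted(regions)


def substructure_anomaly_score(embeddings, region):
    """1 - mean pairwise cosine similarity of the region's node embeddings:
    the less similar the nodes of a dense region, the more suspicious it is."""
    pairs = list(combinations(region, 2))
    mean_sim = sum(cosine(embeddings[i], embeddings[j]) for i, j in pairs) / len(pairs)
    return 1.0 - mean_sim
```

A score near 0 means the dense region's nodes look alike. A score near 1 means they are mutually dissimilar, which is the pattern the abstract associates with topology anomalies.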
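In this paper, we study combinatorial semi-bandits (CMAB) and focus on reducing the dependence of the regret on the batch size $K$, where $K$ is the total number of arms that can be pulled or triggered in each round. First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we discover a novel (directional) triggering probability and variance modulated (TPVM) condition that can replace the previously used smoothness condition for various applications, such as cascading bandits, online network exploration, and online influence maximization. Under this new condition, we propose a BCUCB-T algorithm with variance-aware confidence intervals and conduct a regret analysis that reduces the $O(K)$ factor to $O(\log K)$ or $O(\log^2 K)$ in the regret bound, significantly improving the regret bounds for the above applications. Second, for the non-triggering CMAB setting with independent arms, we propose an SESCB algorithm that leverages the non-triggering version of the TPVM condition and completely removes the dependence on $K$ in the leading regret term. As a valuable by-product, the regret analysis used in this paper can improve several existing results by a factor of $O(\log K)$. Finally, experimental evaluations show that our algorithms outperform the benchmark algorithms in different applications.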
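To illustrate what a "variance-aware confidence interval" changes in a semi-bandit, here is a minimal sketch. It uses a standard empirical-Bernstein (UCB-V-style) index for each base arm, with rewards in $[0, 1]$, and a top-$k$ oracle that plays the $k$ base arms with the largest indices. Both the index constants and the oracle are our assumptions for illustration. This is not BCUCB-T's confidence interval and it ignores probabilistic triggering.

```python
import math


class ArmStats:
    """Running mean and population variance of one base arm (Welford's method)."""

    def __init__(self):
        self.count, self.mean, self._m2 = 0, 0.0, 0.0

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def var(self):
        return self._m2 / self.count if self.count else 0.0


def variance_aware_index(arm, t):
    """Empirical-Bernstein style optimistic index (UCB-V form, rewards in [0, 1]).
    Low-variance arms get a tighter bonus than a Hoeffding-style UCB would give."""
    if arm.count == 0:
        return math.inf
    log_t = math.log(max(t, 2))
    return arm.mean + math.sqrt(2.0 * arm.var * log_t / arm.count) + 3.0 * log_t / arm.count


def select_super_arm(arms, t, k):
    """Hypothetical top-k oracle: play the k base arms with the largest indices."""
    scores = [variance_aware_index(arm, t) for arm in arms]
    return sorted(range(len(arms)), key=lambda i: scores[i], reverse=True)[:k]
```

The variance term is what separates this from plain UCB. Two arms with the same mean and count get different indices when one of them has noisier observations.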
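We study an extension of the standard bandit problem in which there are multiple layers of experts. The multi-layer experts make selections layer by layer, and only the experts in the last layer actually play arms. The goal of the learning policy is to minimize the total regret in this hierarchical-experts setting. We first analyze the case in which the total regret grows linearly with the number of layers. Then, we focus on the case in which all experts play the Upper Confidence Bound (UCB) strategy, and we give several sub-linear upper bounds for different circumstances. Finally, we design experiments to support the regret analysis of the general hierarchical UCB structure and to show the practical significance of our theoretical results. This paper offers many insights into reasonable hierarchical decision structures.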
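A minimal sketch of a hierarchical UCB structure, assuming two layers, UCB1 at every expert, and Bernoulli arm rewards. These are illustrative choices; the paper's setting allows more layers and its algorithms may differ. A top expert picks one lower-layer expert, only that last-layer expert plays an arm, and the observed reward updates both layers. `UCB1` and `run_two_layer` are hypothetical names.

```python
import math
import random


class UCB1:
    """UCB1 over a fixed set of options; each expert in the hierarchy runs one."""

    def __init__(self, n_options):
        self.counts = [0] * n_options
        self.sums = [0.0] * n_options
        self.t = 0

    def select(self):
        self.t += 1
        for i, c in enumerate(self.counts):
            if c == 0:  # try every option once first
                return i
        return max(range(len(self.counts)),
                   key=lambda i: self.sums[i] / self.counts[i]
                   + math.sqrt(2.0 * math.log(self.t) / self.counts[i]))

    def update(self, i, reward):
        self.counts[i] += 1
        self.sums[i] += reward


def run_two_layer(arm_means, rounds, seed=0):
    """Top expert picks a lower expert; the chosen lower expert picks an arm.
    Only last-layer experts play arms; the Bernoulli reward updates both layers."""
    rng = random.Random(seed)
    top = UCB1(len(arm_means))
    lower = [UCB1(len(group)) for group in arm_means]
    pulls = [[0] * len(group) for group in arm_means]
    for _ in range(rounds):
        e = top.select()
        a = lower[e].select()
        reward = 1.0 if rng.random() < arm_means[e][a] else 0.0
        lower[e].update(a, reward)
        top.update(e, reward)
        pulls[e][a] += 1
    return pulls
```

Note that the top expert faces a non-stationary problem: the reward of each lower expert drifts upward as that expert learns. This is one reason the regret of such hierarchies needs its own analysis.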
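Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance. Although MVC has demonstrated promising performance in various applications, most existing methods directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering, which can lead to over-complicated optimization and intensive computational cost. In this paper, we propose late fusion MVC via alignment maximization to address these issues. To this end, we first reveal the theoretical connection between existing k-means clustering and the alignment between the base partitions and the consensus partition. Based on this observation, we propose a simple but effective multi-view algorithm termed LF-MVC-GAM. It optimally fuses the information from multiple sources at the partition level of each individual view and maximally aligns the consensus partition with these weighted base partitions. This alignment helps integrate partition-level information and significantly reduces the computational complexity by considerably simplifying the optimization procedure. We then design another variant, LF-MVC-LAM, which further improves clustering performance by preserving the local intrinsic structure among the multiple partition spaces. After that, we develop two three-step iterative algorithms to solve the resulting optimization problems with theoretically guaranteed convergence. Furthermore, we provide a generalization error bound analysis of the proposed algorithms. Extensive experiments on eighteen multi-view benchmark datasets, ranging from small to large scale, demonstrate the effectiveness and efficiency of the proposed LF-MVC-GAM and LF-MVC-LAM. The code of the proposed algorithms is publicly available at https://github.com/wangsiwei2010/latefusionalignment.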
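The abstract does not state the objective in symbols. As a hedged sketch, assume the alignment objective has the form $\max_{H, \{W_p\}, \beta} \operatorname{Tr}(H^\top \sum_p \beta_p H_p W_p)$ subject to $H^\top H = I$, $W_p^\top W_p = I$ and $\|\beta\|_2 = 1$. This form is our assumption, and LF-MVC-LAM's local-structure term is not reproduced.

Under that assumption, each block has a closed-form update, giving one three-step alternating loop:

- $H$ and each $W_p$ are orthogonal polar factors. These are computed below with a Newton-Schulz iteration instead of an SVD, to stay within the standard library.
- $\beta$ is the per-view alignment vector normalized to unit length.

```python
import math


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def polar(m, iters=60):
    """Orthogonal polar factor U V^T of a full-column-rank matrix M via the
    Newton-Schulz iteration; it maximizes Tr(X^T M) subject to X^T X = I."""
    fro = math.sqrt(sum(v * v for row in m for v in row))
    x = [[v / fro for v in row] for row in m]  # scale so singular values are <= 1
    k = len(m[0])
    for _ in range(iters):
        xtx = matmul(transpose(x), x)
        step = [[(3.0 if i == j else 0.0) - xtx[i][j] for j in range(k)] for i in range(k)]
        x = [[0.5 * v for v in row] for row in matmul(x, step)]
    return x


def late_fusion_alignment(partitions, iters=5):
    """Alternating three-step sketch (iters >= 1) for
    max Tr(H^T sum_p beta_p H_p W_p)  s.t. H^T H = I, W_p^T W_p = I, ||beta||_2 = 1.
    partitions: list of n x k base partition matrices with orthonormal columns."""
    p, k = len(partitions), len(partitions[0][0])
    beta = [1.0 / math.sqrt(p)] * p
    rots = [[[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)] for _ in range(p)]
    for _ in range(iters):
        # Step 1: the consensus H is the polar factor of the weighted, rotated sum.
        aligned = [matmul(hp, wp) for hp, wp in zip(partitions, rots)]
        m = [[sum(b * a[i][j] for b, a in zip(beta, aligned)) for j in range(k)]
             for i in range(len(aligned[0]))]
        h = polar(m)
        # Step 2: each rotation W_p is the polar factor of H_p^T H.
        rots = [polar(matmul(transpose(hp), h)) for hp in partitions]
        # Step 3: beta_p is proportional to Tr(H^T H_p W_p), rescaled to unit L2 norm.
        delta = [trace(matmul(transpose(h), matmul(hp, wp))) for hp, wp in zip(partitions, rots)]
        norm = math.sqrt(sum(d * d for d in delta))
        beta = [d / norm for d in delta]
    return h, rots, beta
```

Each step only manipulates $n \times k$ and $k \times k$ matrices, which is consistent with the abstract's point that partition-level fusion avoids optimizing full $n \times n$ similarity matrices.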
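Recent works have revealed an essential paradigm in designing loss functions that distinguishes individual losses from aggregate losses. An individual loss measures the quality of a model on a single sample, while an aggregate loss combines the individual losses/scores over all training samples. Both share a common procedure that aggregates a set of individual values into a single numerical value. Ranking order reflects the most fundamental relationship among individual values when designing losses. In addition, decomposability, i.e., the property that a loss can be decomposed into an ensemble of individual terms, has become an important property for organizing losses/scores. This survey provides a systematic and comprehensive review of rank-based decomposable losses in machine learning. Specifically, we provide a new taxonomy of loss functions from the perspectives of aggregate loss and individual loss. We identify the aggregators that form such losses, which are examples of set functions. We organize rank-based decomposable losses into eight categories. Following these categories, we review the literature on rank-based aggregate losses and rank-based individual losses. We describe the general formulas of these losses and connect them with existing research topics. We also suggest future research directions covering unexplored, remaining, and emerging issues in rank-based decomposable losses.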
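As a concrete illustration of a rank-based aggregator, the sketch below implements the average of ranked range (AoRR): sort the individual losses in descending order and average those ranked $m+1$ through $k$. We chose this example ourselves; it is not drawn from the survey's taxonomy. Several familiar aggregate losses are special cases:

- the average loss ($m = 0$, $k = n$)
- the maximum loss ($m = 0$, $k = 1$)
- the average top-$k$ loss ($m = 0$)

`ranked_range_average` is a hypothetical name.

```python
def ranked_range_average(losses, m, k):
    """Average of ranked range (AoRR): mean of the (m+1)-th ... k-th largest
    individual losses. m = 0 gives the average top-k loss; m = 0, k = n gives
    the ordinary average loss; m = 0, k = 1 gives the maximum loss."""
    if not 0 <= m < k <= len(losses):
        raise ValueError("need 0 <= m < k <= len(losses)")
    ranked = sorted(losses, reverse=True)  # the rank order drives the aggregation
    return sum(ranked[m:k]) / (k - m)


# Individual losses of five samples (illustrative values).
losses = [5.0, 1.0, 3.0, 4.0, 2.0]
print(ranked_range_average(losses, 0, 5))  # average loss
print(ranked_range_average(losses, 1, 3))  # skips the single largest loss
```

Skipping the top $m$ losses is what makes AoRR less sensitive to a few outliers than the maximum or average top-$k$ loss, while still emphasizing hard samples.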
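Clustering is a representative unsupervised method that is widely applied in multi-modal and multi-view scenarios. Multiple kernel clustering (MKC) aims to group data by integrating complementary information from base kernels. As a representative approach, late fusion MKC first decomposes the kernels into orthogonal partition matrices and then learns a consensus partition from them, and it has recently achieved promising performance. However, these methods fail to consider the noise inside the partition matrices, which prevents further improvement of clustering performance. We find that this noise can be decomposed into separable dual parts, i.e., N-noise and C-noise (null space noise and column space noise). In this paper, we rigorously define dual noise and propose a novel parameter-free MKC algorithm that minimizes it. To solve the resulting optimization problem, we design an efficient two-step iterative strategy. To the best of our knowledge, this is the first work to investigate dual noise within partitions in the kernel space. We observe that dual noise pollutes the diagonal structure and degrades clustering performance, and that C-noise is more destructive than N-noise. Owing to our efficient mechanism for minimizing dual noise, the proposed algorithm outperforms recent methods.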
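One natural reading of the N-noise/C-noise split, stated here as our assumption since the abstract does not give the definitions, is an orthogonal decomposition of a noise matrix $E$ with respect to a partition matrix $H$ with orthonormal columns:

- C-noise is the component $HH^\top E$ lying in the column space of $H$.
- N-noise is the remainder $(I - HH^\top)E$, which lies in the null space of $H^\top$.

The sketch below computes this split. `split_dual_noise` and `frobenius` are hypothetical helpers, and the paper's precise definitions may differ.

```python
import math


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frobenius(a):
    """Frobenius norm, e.g. to compare how much energy each noise part carries."""
    return math.sqrt(sum(v * v for row in a for v in row))


def split_dual_noise(h, e):
    """Split noise E (n x k) w.r.t. a partition H (n x k, H^T H = I) into
    N-noise (I - H H^T) E and C-noise H H^T E (assumed definitions)."""
    c_noise = matmul(h, matmul(transpose(h), e))  # projection onto col(H)
    n_noise = [[e[i][j] - c_noise[i][j] for j in range(len(e[0]))] for i in range(len(e))]
    return n_noise, c_noise
```

Because the two parts are orthogonal, $\|E\|_F^2 = \|E_N\|_F^2 + \|E_C\|_F^2$. This is what makes it possible to penalize them separately, as the abstract's "separable dual parts" suggests.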